Web search activity data accurately predict population chronic disease risk in the USA.
Abstract
BACKGROUND The WHO framework for non-communicable disease (NCD) describes risks and outcomes comprising the majority of the global burden of disease. These factors are complex and interact at biological, behavioural, environmental and policy levels, presenting challenges for population monitoring and intervention evaluation. This paper explores the utility of machine learning methods applied to population-level web search activity as a proxy for chronic disease risk factors. METHODS Web activity output for each element of the WHO's Causes of NCD framework was used as a basis for identifying relevant web search activity from 2004 to 2013 for the USA. Multiple linear regression models with regularisation were used to generate predictive algorithms, mapping web search activity to risk factor and disease prevalence as measured by the Centers for Disease Control and Prevention (CDC). Predictions for subsequent target years not included in the model derivation were tested against CDC data from population surveys using Pearson correlation and Spearman's r. RESULTS For 2011 and 2012, predicted prevalence was very strongly correlated with measured risk data, with correlations ranging from fruits and vegetables consumed (r=0.81; 95% CI 0.68 to 0.89) to alcohol consumption (r=0.96; 95% CI 0.93 to 0.98). Mean difference between predicted and measured values by state ranged from 0.03 to 2.16. Spearman's r for state-wise predicted versus measured prevalence varied from 0.82 to 0.93. CONCLUSIONS The high predictive validity of web search activity for NCD risk has the potential to provide real-time information on population risk during policy implementation and other population-level NCD prevention efforts.
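The METHODS step (regularised linear regression, then Pearson and Spearman checks on held-out data) can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not say which regulariser was used, so an L2 (ridge) penalty is assumed here, and the data are synthetic stand-ins for standardised search-volume features and measured prevalence. All function names (predict, ridge_fit, pearson, ranks, make_states) are hypothetical.

```python
import math
import random
import statistics


def predict(w, b, row):
    """Linear prediction: intercept plus weighted sum of the features."""
    return b + sum(wj * xj for wj, xj in zip(w, row))


def ridge_fit(X, y, lam=0.1, lr=0.05, epochs=2000):
    """Linear regression with an assumed L2 (ridge) penalty, via gradient descent.

    Minimises mean squared error + lam * sum(w_j ** 2); the intercept is not
    penalised. Assumes the features are already roughly standardised.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        resid = [predict(w, b, row) - yi for row, yi in zip(X, y)]
        grad_w = [2 * sum(r * row[j] for r, row in zip(resid, X)) / n
                  + 2 * lam * w[j] for j in range(p)]
        grad_b = 2 * sum(resid) / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def pearson(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


# Synthetic data only: each row stands for one state, each column for one
# standardised search-term volume; y stands for measured prevalence (%).
rng = random.Random(0)
true_w, true_b = [1.5, -0.8, 0.4], 25.0


def make_states(n):
    X = [[rng.gauss(0, 1) for _ in true_w] for _ in range(n)]
    y = [predict(true_w, true_b, row) + rng.gauss(0, 0.3) for row in X]
    return X, y


X_train, y_train = make_states(50)   # years used to derive the model
X_test, y_test = make_states(50)     # held-out target year

w, b = ridge_fit(X_train, y_train)
pred = [predict(w, b, row) for row in X_test]

r = pearson(pred, y_test)                    # Pearson correlation
rho = pearson(ranks(pred), ranks(y_test))    # Spearman: Pearson on ranks
mean_abs_diff = statistics.fmean(abs(p - m) for p, m in zip(pred, y_test))
print(f"Pearson r={r:.2f}, Spearman rho={rho:.2f}, mean |diff|={mean_abs_diff:.2f}")
```

Evaluating on rows that played no part in fitting mirrors the abstract's use of target years excluded from model derivation. In practice the penalty strength lam would be chosen by cross-validation rather than fixed as it is here.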
Similar resources
Prevalence and Risk Factors for Chronic Kidney Disease in Family Relatives of a Cameroonian Population of Hemodialysis Patients: A Cross-Sectional Study
Background: In sub-Saharan Africa (SSA), the trend in the number of patients admitted for maintenance hemodialysis is on the rise. The identification of risk factors for chronic kidney disease (CKD) ensures adequate primary and secondary preventive measures geared at reducing the burden of CKD in low-resource settings. A family history of CKD is an established risk factor for C...
Association between IL6 gene polymorphism and the risk of chronic obstructive pulmonary disease in the north Indian population
Interleukin-6 (IL6) is encoded by the IL6 gene in humans and acts as both a pro-inflammatory and an anti-inflammatory cytokine. Recent studies have established that IL6 contributes substantially to the diagnosis of systemic inflammation in patients suffering from lung diseases such as chronic obstructive pulmonary disease (COPD). Therefore, this work aimed to investigate the prot...
The best anthropometric index for predicting cardiovascular disease risk factors in men living in District 13 of Tehran
Background: It is essential to identify the best simple anthropometric index in any population to predict chronic disease risk. This study was designed to compare the ability of waist circumference (WC), body mass index (BMI) and waist-to-hip ratio (WHR) to predict cardiovascular risk factors in an urban adult population of Tehranian men. Materials and Methods: This po...
An Ensemble Click Model for Web Document Ranking
Every year, web search engine providers spend more and more money on document ranking in search engine result pages (SERPs). Click models provide useful information for ranking documents in SERPs by modeling interactions between users and search engines. Here, three modules are employed to create a hybrid click model; the first module is a PGM-based click model, the second module in a d...
Context-based Query Prediction to Populate a Personal Linked Data Cache
Search engines aim to meet users’ information needs by indexing vast corpora of Web documents and selecting candidate results based on keyword queries. However, the emergence of a Web of Data [7] enables new forms of application that require expressive query access, for which mature, Web-scale information retrieval techniques may not be suited. Rather than attempting to deliver expressive query...
Journal title:
- Journal of Epidemiology and Community Health
Volume 69, Issue 7
Pages -
Publication date 2015